Calculating hydrogen bonds: the basics

We will find the hydrogen bonds in a box of water in order to demonstrate the basic usage of HydrogenBondAnaysis.

Last updated: February 3, 2021 with MDAnalysis 2.0.0-dev

Minimum version of MDAnalysis: 2.0.0-dev0

Packages required:

See also

Note

Please cite Smith et al. (2018) when using HydrogenBondAnaysis in published work.

[1]:
import pickle
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import waterPSF, waterDCD
from MDAnalysis.analysis.hydrogenbonds import HydrogenBondAnalysis

# the next line is necessary to display plots in Jupyter
%matplotlib inline

Loading files

We will load a small water-only systems containing 5 water molecules and 10 frames then find the hydrogen bonds present at each frame.

[2]:
u = mda.Universe(waterPSF, waterDCD)

Warning

It is highly recommended that a topology with bond information is used (e.g. PSF, TPR, PRMTOP), as this is the only way to guarantee the correct identification of donor-hydrogen pairs.

Hydrogen bonds

In molecular dynamics simulations, hydrogen bonds are typically identified via the following geometric criteria:

  1. the donor-acceptor distance (\(r_{DA}\)) must be less than a specified cutoff, typically 3 Å

  2. the donor-hydrogen-acceptor angle (\(\theta_{DHA}\)) must be greater than a specified cutoff, typically 150°

65a4b55f23ae46b1ba97023329518ded

Find water-water hydrogen bonds

Basic usage of HydrogenBondAnalysis involvees passing an acceptor atom selection (acceptors_sel) and a hydrogen atom selection (hydrogens_sel) to HydrogenBondAnalysis. By not providing a donor atom selection (donor_sel) we will use the topology bond information to find donor-hydrogen pairs.

[3]:
hbonds = HydrogenBondAnalysis(
    universe=u,
    donors_sel=None,
    hydrogens_sel="name H1 H2",
    acceptors_sel="name OH2",
    d_a_cutoff=3.0,
    d_h_a_angle_cutoff=150,
    update_selections=False
)

Note

For a small performance boost, update_selections can always be set to False unless the given selections will result in an AtomGroup that changes across frames, e.g if acceptors_sel="name OH2 and around 3.0 protein" then update_selections must be True

If we do not set the start, stop, and step for frames to analyse, all frames will be used.

[4]:
hbonds.run(
    start=None,
    stop=None,
    step=None,
    verbose=True
)

The results are stored as a numpy array in the hbonds.hbonds attribute

Each row of the results array contains information on a single hydrogen bond

[frame, donor_index, hydrogen_index, acceptor_index, DA_distance, DHA_angle]

For clarity, below we define constants that will be used to access the relevant column of each hbond row

[5]:
FRAME = 0
DONOR = 1
HYDROGEN = 2
ACCEPTOR = 3
DISTANCE = 4
ANGLE = 5

Helper functions

The are three helper functions that can be used to quickly post-process the results.

1. Count by time. Counts the total number of hydrogen bonds at each frame

[6]:
plt.plot(hbonds.times, hbonds.count_by_time(), lw=2)

plt.title("Number of hydrogeon bonds over time", weight="bold")
plt.xlabel("Time (ps)")
plt.ylabel(r"$N_{HB}$")

plt.show()

../../../_images/examples_analysis_hydrogen_bonds_hbonds_19_0.png

2. Count by type. Counts the total number of each type of hydrogen bond.

A type is a unique combination of donor residue name, donor atom type, acceptor residue name, and acceptor atom type.

[7]:
hbonds.count_by_type()
[7]:
array([['TIP3:OT', 'TIP3:OT', '27']], dtype='<U7')

Each row contains a unique hydrogen bond type. The three columns correspond to:

  1. the donor resname and donor name

  2. the acceptor resname and acceptor name

  3. the total count

The average number of each type of hydrogen bond formed at each frame is likely more informative than the total number over the trajectory. This can be calculated for each hydrogen bond type as follows:

[8]:
n_frames = hbonds.frames.size
for donor, acceptor, count in hbonds.count_by_type():

    donor_resname, donor_type = donor.split(":")
    n_donors = u.select_atoms(f"resname {donor_resname} and type {donor_type}").n_atoms

    # average number of hbonds per donor molecule per frame
    mean_count = int(count) / (n_frames * n_donors)
    print(f"{donor} to {acceptor}: {mean_count:.2f}")

TIP3:OT to TIP3:OT: 0.54

Note

In a water-only system the average number of hyrogen bonds per water molecule should be around 3.3, depending on temperature and forcefield. We don’t see this due to the small system size (15 water molecules).

3. Count by ids. Counts the total number of each hydrogen bond formed between specific atoms.

[9]:
hbonds.count_by_ids()
[9]:
array([[12, 14,  9, 10],
       [ 9, 10,  3,  9],
       [ 3,  5,  0,  4],
       [ 0,  1,  6,  3],
       [ 6,  7, 12,  1]])

Each row contains a unique hydrogen bond. The four columns correspond to:

  1. the donor atom index

  2. the hydrogen atom index

  3. the acceptor atom index

  4. the total count

The array is sorted in descending total count. You can check which atoms are involved in the most frequently observed hydrogen bond:

[10]:
counts = hbonds.count_by_ids()
most_common = counts[0]

u.atoms[most_common[DONOR]], u.atoms[most_common[HYDROGEN]], u.atoms[most_common[ACCEPTOR]]

[10]:
(<Atom 15: H2 of type HT of resname TIP3, resid 21 and segid WAT>,
 <Atom 10: OH2 of type OT of resname TIP3, resid 15 and segid WAT>,
 <Atom 11: H1 of type HT of resname TIP3, resid 15 and segid WAT>)

Further analysis

There are many different analyses you may want to perform after finding hydrogen bonds. Below we will calculate the mean number of hydrogen bonds as a function of \(z\) position of the donor atom.

[11]:
# bins in z for the histogram
bin_edges = np.linspace(-25, 25, 51)
bin_centers = bin_edges[:-1] + 0.5

# results array
counts = np.full(bin_centers.size, fill_value=0.0)

for ts in u.trajectory[hbonds.frames]:

    donors = u.atoms[hbonds.hbonds[hbonds.hbonds[:, FRAME] == ts.frame, DONOR].astype(int)]
    zpos = donors.positions[:,2]
    hist, *_ = np.histogram(zpos, bins=bin_edges)
    counts += hist

counts /= hbonds.frames.size

[12]:
plt.plot(bin_centers, counts, lw=2)

plt.title(r"Number of hydrogen bonds as a funcion of height in $z$", weight="bold")
plt.xlabel(r"$z\ \rm (\AA)$")
plt.ylabel(r"$N_{HB}$")

plt.show()

../../../_images/examples_analysis_hydrogen_bonds_hbonds_32_0.png

You may wish to find the number of hydrogen bonds in \(z\) per unit area in \(xy\), rather than the total number a each \(z\) position. To do this, simply divide counts by the mean area in \(xy\):

[ ]:
mean_xy_area = np.mean([np.product(ts.dimensions[:2]) for ts in u.trajectory[hbonds.frames]])
counts /= mean_xy_area

Note

It is likely you will want to calculate the distance from an interface, such as a lipid membrane, rather than the absolute position in \(z\). In this case:

zpos = donors.positions[:,2] - interface_zpos

Store data

There are various ways of storing the results from the analysis.

  1. You can persist the HydrogenBondsAnalysis object using pickle (or joblib)

[13]:
with open("data/hbonds.pkl", 'wb') as f:
    pickle.dump(hbonds, f)
[14]:
with open("data/hbonds.pkl", 'rb') as f:
    hbonds = pickle.load(f)
  1. You can extract the results and store the numpy array

[15]:
np.save("data/hbonds.npy", hbonds.hbonds)
  1. You can create a Pandas DataFrame with more information about hydrogen bonding atoms, which may make some further analysis easier.

[16]:
frame, donor_indices, hydrogen_indices, acceptor_indices = hbonds.hbonds[:, :DISTANCE].T.astype(int)
distances, angles = hbonds.hbonds[:, DISTANCE:].T

df = pd.DataFrame(

    data = np.array([
        frame,
        donor_indices,
        hydrogen_indices,
        acceptor_indices,
        u.atoms[donor_indices].resnames,
        u.atoms[donor_indices].resids,
        u.atoms[donor_indices].names,
        u.atoms[hydrogen_indices].names,
        u.atoms[acceptor_indices].resnames,
        u.atoms[acceptor_indices].resids,
        u.atoms[acceptor_indices].names,
        distances,
        angles
    ]).T,

    columns=[
        "Frame",
        "Donor_ix",
        "Hydrogen_ix",
        "Acceptor_ix",
        "Donor_resname",
        "Donor_resid",
        "Donor_name",
        "Hydrogen_name",
        "Acceptor_resname",
        "Accetpor_resid",
        "Accetpor_name",
        "Distance",
        "Angle"
    ]

)

[17]:
df.to_csv("data/hbonds.csv", index=False)

References

[1] Richard J. Gowers, Max Linke, Jonathan Barnoud, Tyler J. E. Reddy, Manuel N. Melo, Sean L. Seyler, Jan Domański, David L. Dotson, Sébastien Buchoux, Ian M. Kenney, and Oliver Beckstein. MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations. Proceedings of the 15th Python in Science Conference, pages 98–105, 2016. 00152. URL: https://conference.scipy.org/proceedings/scipy2016/oliver_beckstein.html, doi:10.25080/Majora-629e541a-00e.

[2] Naveen Michaud-Agrawal, Elizabeth J. Denning, Thomas B. Woolf, and Oliver Beckstein. MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. Journal of Computational Chemistry, 32(10):2319–2327, July 2011. 00778. URL: http://doi.wiley.com/10.1002/jcc.21787, doi:10.1002/jcc.21787.

[3] Paul Smith, Robert M. Ziolek, Elena Gazzarrini, Dylan M. Owen, and Christian D. Lorenz. On the interaction of hyaluronic acid with synovial fluid lipid membranes. Phys. Chem. Chem. Phys., 21(19):9845-9857, 2018. URL: http://dx.doi.org/10.1039/C9CP01532A